Recently, online social media has become a primary source of news, but also of misinformation and rumours. In the absence of automatic rumour detection systems, the propagation of rumours has increased manifold, leading to serious societal damage. In this work, we propose a novel method for building an automatic rumour detection system that focuses on oversampling to alleviate the fundamental challenge of class imbalance in the rumour detection task. Our oversampling method relies on contextualised data augmentation to generate synthetic samples for underrepresented classes in the dataset. The key idea is to exploit the selection of tweets in a thread for augmentation, achieved by introducing a non-random selection criterion that focuses the augmentation process on relevant tweets. Furthermore, we propose two graph neural networks (GNNs) to model non-linear conversations in a thread. To enhance the tweet representations in our method, we employ a custom feature selection technique based on the state-of-the-art BERTweet model. Experiments on three publicly available datasets confirm that 1) our GNN models outperform the current state-of-the-art classifiers by more than 20% (F1-score); 2) our oversampling technique increases model performance by more than 9% (F1-score); 3) focusing on relevant tweets for data augmentation via the non-random selection criterion can further improve the results; and 4) our method has superior capabilities for detecting rumours at a very early stage.
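The non-random selection idea can be sketched minimally. The lexical-overlap score and word-shuffle "augmentation" below are toy placeholders for illustration only; the paper's method uses contextualised (BERTweet-based) augmentation, and the example tweets are invented:

```python
import random

def relevance_score(tweet, source):
    # toy relevance criterion: lexical overlap with the source tweet of the thread
    src = set(source.lower().split())
    tok = set(tweet.lower().split())
    return len(src & tok) / max(len(tok), 1)

def select_for_augmentation(thread, k=2):
    # non-random selection: keep the k replies most relevant to the source tweet
    source, replies = thread[0], thread[1:]
    ranked = sorted(replies, key=lambda t: relevance_score(t, source), reverse=True)
    return ranked[:k]

def augment(tweet, rng):
    # placeholder for contextualised augmentation (e.g. masked-LM word substitution);
    # here we merely shuffle word order to produce a distinct synthetic sample
    words = tweet.split()
    rng.shuffle(words)
    return " ".join(words)

rng = random.Random(0)
thread = [
    "breaking: bridge collapse in the city centre",
    "i heard the bridge collapse was staged",
    "lol nice weather today",
    "city centre bridge collapse confirmed by police",
]
selected = select_for_augmentation(thread, k=2)
synthetic = [augment(t, rng) for t in selected]
```

The off-topic reply is skipped, so synthetic samples for the minority class are built only from thread-relevant content.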
Research has shown that climate change creates warmer temperatures and drier conditions, leading to longer wildfire seasons and increased wildfire risks in the United States. These factors have in turn led to increases in the frequency, extent, and severity of wildfires in recent years. Given the danger posed by wildland fires to people, property, wildlife, and the environment, there is an urgency to provide tools for effective wildfire management. Early detection of wildfires is essential to minimizing potentially catastrophic destruction. In this paper, we present our work on integrating multiple data sources in SmokeyNet, a deep learning model using spatio-temporal information to detect smoke from wildland fires. Camera image data is integrated with weather sensor measurements and processed by SmokeyNet to create a multimodal wildland fire smoke detection system. We present our results comparing performance in terms of both accuracy and time-to-detection for multimodal data vs. a single data source. With a time-to-detection of only a few minutes, SmokeyNet can serve as an automated early notification system, providing a useful tool in the fight against destructive wildfires.
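Multimodal integration of this kind is commonly realised as feature-level fusion. The following is a hedged sketch only: the feature values, weights, and linear scoring head are illustrative stand-ins, not SmokeyNet's actual architecture:

```python
def fuse(image_feats, weather_feats):
    # late-fusion sketch: concatenate per-frame image features with the
    # time-aligned weather measurements into one vector for the classifier head
    return image_feats + weather_feats

def linear_score(x, w, b):
    # toy smoke / no-smoke score on the fused vector
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# hypothetical inputs: two CNN-derived image features plus [temp_C, humidity]
fused = fuse([0.8, 0.1], [25.0, 0.3])
score = linear_score(fused, w=[1.0, -0.5, 0.01, -1.0], b=0.0)
```

The design choice here is simple concatenation before a shared head; richer fusion schemes (attention over modalities, per-modality encoders) follow the same pattern of producing one joint representation.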
With the large-scale adoption of biometric-based applications, the security and privacy of biometrics are of utmost importance, especially when operating in unsupervised online mode. This work proposes a novel approach for generating new artificial fingerprints, also called proxy fingerprints, that are natural-looking, non-invertible, revocable and privacy-preserving. These proxy biometrics can be generated from the originals only with the help of a user-specific key. Instead of the original fingerprint, these proxy templates can be used anywhere with the same convenience. The manuscript walks through an interesting way in which proxy fingerprints of different types can be generated, and how they can be combined with user-specific keys to provide revocability and cancelability in case of compromise. Using the proposed approach, a proxy dataset is generated from samples belonging to the Anguli fingerprint database. Matching experiments were performed on the new set, which is 5 times larger than the original, and performance was found to be on par, with 0 FAR and 0 FRR in the stolen-key and safe-key scenarios. Other parameters concerning revocability and diversity are also analyzed for protection performance.
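The revocability idea, i.e. deriving a replaceable template from the original plus a user-specific key, can be illustrated with a generic key-seeded transform. This is not the paper's construction: a bare permutation like this is invertible given the key, whereas the paper's proxy fingerprints are non-invertible; the template values are arbitrary:

```python
import hashlib
import random

def proxy_template(template, user_key):
    # generic revocable-template sketch: a user-specific key seeds a
    # deterministic permutation of the template; issuing a new key yields a
    # different proxy, so a compromised template can be cancelled and re-issued
    seed = int.from_bytes(hashlib.sha256(user_key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    idx = list(range(len(template)))
    rng.shuffle(idx)
    return [template[i] for i in idx]

original = [3, 1, 4, 1, 5, 9, 2, 6]
proxy_a = proxy_template(original, "key-2024")
proxy_b = proxy_template(original, "key-2025")  # re-issued after a compromise
```

Matching is then performed entirely in the proxy domain, so the original template never needs to be stored or transmitted.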
A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training. However, to date, empirical evidence assessing the explanatory power of these hypotheses is lacking. In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap. Additionally, we characterize how the quantities that SGD has been claimed to (implicitly) regularize change over the course of training. By using micro-batches, i.e., disjoint smaller subsets of each mini-batch, we empirically show that explicitly penalizing the gradient norm or the Fisher Information Matrix trace, averaged over micro-batches, in the large-batch regime recovers small-batch SGD generalization, whereas Jacobian-based regularizations fail to do so. This generalization performance is shown to often be correlated with how well the regularized model's gradient norms resemble those of small-batch SGD. We additionally show that this behavior breaks down as the micro-batch size approaches the batch size. Finally, we note that in this line of inquiry, positive experimental findings on CIFAR10 are often reversed on other datasets like CIFAR100, highlighting the need to test hypotheses on a wider collection of datasets.
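The explicit penalty can be illustrated on a toy linear model: `regularized_loss` below adds the average micro-batch gradient norm to a full-batch MSE objective. This is a minimal sketch of the penalty's structure, not the paper's deep-network training loop, and the data values are invented:

```python
def grad_mse(w, X, y):
    # gradient of mean squared error for a linear model y_hat = x . w
    n = len(X)
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
        for j, xj in enumerate(xi):
            g[j] += 2.0 * err * xj / n
    return g

def norm(v):
    return sum(c * c for c in v) ** 0.5

def regularized_loss(w, X, y, micro=2, lam=0.1):
    # full-batch loss plus lam times the average gradient norm over disjoint
    # micro-batches, i.e. the explicit version of the quantity the paper finds
    # recovers small-batch generalization in the large-batch regime
    n = len(X)
    mse = sum((sum(wj * xj for wj, xj in zip(w, xi)) - yi) ** 2
              for xi, yi in zip(X, y)) / n
    chunks = [(X[i:i + micro], y[i:i + micro]) for i in range(0, n, micro)]
    penalty = sum(norm(grad_mse(w, Xc, yc)) for Xc, yc in chunks) / len(chunks)
    return mse + lam * penalty

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
y = [1.0, 1.0, 2.0, 2.0]
w0 = [0.0, 0.0]
base = regularized_loss(w0, X, y, lam=0.0)
reg = regularized_loss(w0, X, y, lam=0.1)
```

In practice this term must itself be differentiated (e.g. via double backpropagation), which is why penalizing gradient norms is more expensive than plain SGD.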
The compute-intensive nature of neural networks (NNs) limits their deployment in resource-constrained environments such as cell phones, drones, autonomous robots, etc. Hence, developing robust sparse models fit for safety-critical applications has been an issue of longstanding interest. Though adversarial training has been combined with model sparsification to attain this goal, conventional adversarial training approaches provide no formal guarantee that the models would be robust against any rogue samples in a restricted space around a benign sample. Recently proposed verified local robustness techniques provide such a guarantee. This is the first paper that combines the ideas from verified local robustness and dynamic sparse training to develop SparseVLR, a novel framework to search for verified locally robust sparse networks. The obtained sparse models exhibit accuracy and robustness comparable to their dense counterparts at sparsity as high as 99%. Furthermore, unlike most conventional sparsification techniques, SparseVLR does not require a pre-trained dense model, reducing the training time by 50%. We exhaustively investigated SparseVLR's efficacy and generalizability by evaluating it on various benchmark and application-specific datasets across several models.
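Dynamic sparse training alternates pruning and regrowth while keeping a fixed sparsity budget. The step below is a generic magnitude-prune / random-regrow sketch, not SparseVLR's robustness-aware criterion, and the weight values are invented:

```python
import random

def prune_and_regrow(weights, sparsity=0.5, regrow_frac=0.2, rng=None):
    # one generic dynamic-sparse-training step: deactivate the
    # smallest-magnitude active weights, then reactivate an equal number of
    # currently-inactive positions at random (here they simply recover their
    # stored values; real DST typically re-initializes regrown weights)
    rng = rng or random.Random(0)
    n = len(weights)
    keep = n - int(sparsity * n)  # number of active (non-zero) slots
    order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    active, inactive = set(order[:keep]), set(order[keep:])
    drop = int(regrow_frac * keep)
    weakest = sorted(active, key=lambda i: abs(weights[i]))[:drop]
    grown = rng.sample(sorted(inactive), drop)
    active = (active - set(weakest)) | set(grown)
    return [w if i in active else 0.0 for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.001]
w_new = prune_and_regrow(w, sparsity=0.5, regrow_frac=0.5)
```

Because the sparsity mask evolves during training, no pre-trained dense model is needed, which is the property the abstract highlights.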
Asteroids are an indelible part of most astronomical surveys, though only a few surveys are dedicated to their detection. Over the years, high cadence microlensing surveys have amassed several terabytes of data while scanning primarily the Galactic Bulge and Magellanic Clouds for microlensing events and thus provide a treasure trove of opportunities for scientific data mining. In particular, numerous asteroids have been observed by visual inspection of selected images. This paper presents novel deep learning-based solutions for the recovery and discovery of asteroids in the microlensing data gathered by the MOA project. Asteroid tracklets can be clearly seen by combining all the observations on a given night and these tracklets inform the structure of the dataset. Known asteroids were identified within these composite images and used for creating the labelled datasets required for supervised learning. Several custom CNN models were developed to identify images with asteroid tracklets. Model ensembling was then employed to reduce the variance in the predictions as well as to improve the generalisation error, achieving a recall of 97.67%. Furthermore, the YOLOv4 object detector was trained to localize asteroid tracklets, achieving a mean Average Precision (mAP) of 90.97%. These trained networks will be applied to 16 years of MOA archival data to find both known and unknown asteroids that have been observed by the survey over the years. The methodologies developed can be adapted for use by other surveys for asteroid recovery and discovery.
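Ensembling to reduce prediction variance is typically done by averaging per-class probabilities across models. A minimal sketch follows; the toy "models" are fixed functions standing in for the trained CNNs, and the probabilities are invented:

```python
def ensemble_predict(models, x):
    # average per-class probabilities across ensemble members, then take the
    # argmax; averaging lowers the variance of any single model's prediction
    probs = [m(x) for m in models]
    k = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(models) for c in range(k)]
    return avg.index(max(avg)), avg

# three toy "models" returning fixed [no-tracklet, tracklet] probabilities
m1 = lambda x: [0.6, 0.4]
m2 = lambda x: [0.2, 0.8]
m3 = lambda x: [0.3, 0.7]
label, avg = ensemble_predict([m1, m2, m3], x=None)
```

Here the first model alone would have voted "no tracklet", but the averaged ensemble correctly favours the tracklet class, illustrating how ensembling smooths out individual-model errors.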
In many real-world problems, the learning agent needs to learn a problem's abstractions and solution simultaneously. However, most such abstractions need to be designed and refined by hand for different problems and domains of application. This paper presents a novel top-down approach for constructing state abstractions while carrying out reinforcement learning. Starting with state variables and a simulator, it presents a novel domain-independent approach for dynamically computing an abstraction based on the dispersion of Q-values in abstract states as the agent continues acting and learning. Extensive empirical evaluation on multiple domains and problems shows that this approach automatically learns abstractions that are finely tuned to the problem, yield strong sample efficiency, and result in the RL agent significantly outperforming existing approaches.
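The dispersion-driven idea can be caricatured in a few lines: an abstract state whose member states disagree too much on value gets split. The split rule, threshold, and Q-values below are illustrative placeholders, not the paper's exact criterion:

```python
def dispersion(qvals):
    # spread of Q-values among the concrete states mapped to one abstract state
    return max(qvals) - min(qvals)

def refine(abstraction, q, threshold=1.0):
    # split any abstract state whose members disagree too much on value;
    # a toy median split stands in for the paper's richer, variable-based rule
    new = {}
    for abs_state, members in abstraction.items():
        vals = [q[s] for s in members]
        if dispersion(vals) > threshold and len(members) > 1:
            mid = sorted(vals)[len(vals) // 2]
            lo = [s for s in members if q[s] < mid]
            hi = [s for s in members if q[s] >= mid]
            new[abs_state + "_lo"], new[abs_state + "_hi"] = lo, hi
        else:
            new[abs_state] = members
    return new

q = {"s1": 0.1, "s2": 0.2, "s3": 5.0, "s4": 4.8}
abstraction = {"A": ["s1", "s2", "s3", "s4"]}
refined = refine(abstraction, q, threshold=1.0)
```

Interleaving such refinement with learning lets the abstraction stay coarse where values agree and grow fine only where the distinction matters, which is the source of the sample-efficiency gains the abstract reports.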
Healthcare is an important aspect of human life. After the pandemic, the use of technology in healthcare has increased manifold. IoT-based systems and devices proposed in the literature can help elderly people, children and adults facing or experiencing health problems. This paper exhaustively reviews 39 wearable-based datasets that can be used to evaluate systems for recognising activities of daily living and falls. A comparative analysis of the SisFall dataset is performed using five machine learning methods: logistic regression, linear discriminant analysis, k-nearest neighbours, decision tree and naive Bayes. The dataset is modified in two ways: first, all attributes present in the dataset are used and labelled in binary form; second, the magnitude of the three axes (x, y, z) is computed, and experiments are then performed using only the magnitude to label the attribute. Experiments are carried out on one subject, ten subjects and all subjects, and compared in terms of accuracy, precision and recall. The results obtained from this study prove that KNN outperforms the other machine learning methods in terms of accuracy, precision and recall. It is also concluded that data personalisation improves accuracy.
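The magnitude feature and the best-performing classifier can be sketched in a few lines. The accelerometer readings below are made-up illustrative values, not SisFall samples, and the 1-D nearest-neighbour search is a simplification of a full KNN pipeline:

```python
import math

def magnitude(ax, ay, az):
    # single-channel feature described above: sqrt(x^2 + y^2 + z^2)
    return math.sqrt(ax * ax + ay * ay + az * az)

def knn_predict(train, query, k=3):
    # minimal k-nearest-neighbours vote on the magnitude feature;
    # train is a list of (magnitude, label) pairs
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)

train = [
    (magnitude(0.0, 0.0, 9.8), "ADL"),    # activities of daily living: ~1 g
    (magnitude(0.1, 0.2, 9.7), "ADL"),
    (magnitude(0.0, 0.1, 9.9), "ADL"),
    (magnitude(15.0, 8.0, 20.0), "fall"),  # falls: large impact magnitudes
    (magnitude(12.0, 9.0, 18.0), "fall"),
    (magnitude(14.0, 7.0, 21.0), "fall"),
]
pred = knn_predict(train, magnitude(13.0, 8.0, 19.0), k=3)
```

Collapsing the three axes into one magnitude makes the feature orientation-invariant, which is why the abstract evaluates it as an alternative to using all raw attributes.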
Creating complex works of art like music requires deep creativity. With recent advances in deep learning and powerful models such as Transformers, automatic music generation has made great progress. In the accompaniment generation setting, creating a coherent drum pattern placed appropriately within a song is a challenging task, even for experienced drummers. Drum beats tend to follow repetitive patterns across bars, punctuated by fills or improvisation. In this work, we tackle the task of drum pattern generation conditioned on music played by four melodic instruments: piano, guitar, bass and strings. We use a Transformer sequence-to-sequence model to generate a basic drum pattern conditioned on the melodic accompaniment, and find that improvisation is largely absent, which can likely be attributed to its expectedly low representation in the training data. We propose a novel function to capture the degree of improvisation in a bar relative to its neighbours. We train a model to predict improvisation locations from the melodic accompaniment tracks. Finally, we use a novel BERT-inspired in-filling architecture that learns the structure of both drums and melody to realise the in-filling of improvised music elements.
Fall detection for the elderly is a well-studied problem with a number of proposed solutions, including wearable and non-wearable techniques. Despite the high detection rates of existing techniques, adoption by the target population is lacking due to the need to wear a device and user privacy concerns. Our paper provides a novel, non-wearable, unobtrusive and scalable solution for fall detection, deployed on an autonomous mobile robot equipped with a microphone. The proposed method uses ambient sound input recorded in people's homes. We specifically target the bathroom environment, as it is prone to falls and existing techniques cannot be deployed there without jeopardising user privacy. The current work develops a solution based on a Transformer architecture that takes noisy sound input from the bathroom and classifies it into fall/no-fall classes with an accuracy of 0.8673. Moreover, the proposed approach is extendable to other indoor environments besides bathrooms and is suitable for deployment in elderly homes, hospitals and rehabilitation facilities, without requiring users to wear any device or be constantly "watched" by sensors.